
Support PaddlePaddle with compatible API #1642

Closed

SigureMo wants to merge 8 commits into flashinfer-ai:main from PFCCLab:support-paddlepaddle-with-compatible-api

Conversation


@SigureMo SigureMo commented Sep 5, 2025

We are PaddlePaddle contributors working on a PyTorch compatibility layer aimed at making it significantly easier for PyTorch ecosystem libraries to run on Paddle. See context: #1563

Summary

  • This PR introduces a minimal, opt-in compatibility path so third-party projects such as flashinfer can be used with Paddle with very small changes.
  • The approach is intentionally minimal and opt-in to avoid breaking upstream behavior for existing PyTorch users.

Design

  • C++ / CUDA layer: provide an adapter that is fully compatible with the PyTorch C++ API surface (ATen / c10 / torch)[^1]. This allows third-party libraries that call into PyTorch C++/CUDA APIs to invoke Paddle's C++/CUDA implementation through the adapter instead.
  • Python layer: reorganize a small compatibility layer so that Paddle's Python API matches PyTorch's API shape as closely as possible (we avoid reproducing PyTorch-specific internals like `TorchVersion`). The goal is that Python code can do `import paddle as torch` and run with minimal or no source changes.
  • Import proxy: provide `paddle.compat.enable_torch_proxy()`[^2], which makes `import torch` actually load paddle. This removes the need for `import paddle as torch` in most cases and keeps changes non-invasive.
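
For intuition, the simplest form of such a proxy is a `sys.modules` alias. The sketch below is purely illustrative and is not Paddle's actual implementation (the real `enable_torch_proxy` is linked in footnote 2):

```python
# Illustrative sketch only; Paddle's real implementation is in
# paddle.compat.enable_torch_proxy (see footnote 2). Aliasing the module in
# sys.modules makes a later `import torch` return the paddle module object.
import importlib
import sys

def enable_torch_proxy_sketch() -> None:
    paddle = importlib.import_module("paddle")
    sys.modules["torch"] = paddle

enable_torch_proxy_sketch()
import torch  # the name `torch` is now bound to the paddle module

assert torch is sys.modules["paddle"]
```

A production proxy must also handle submodule imports such as `torch.nn` (for example via a `sys.meta_path` finder), which a bare alias does not cover.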

Usage (example)

  • Install (build with compatibility enabled):

    ```bash
    PADDLE_COMPATIBLE_API=1 pip install -v --no-build-isolation .
    ```

  • Runtime example:

    ```python
    # example.py
    import paddle

    paddle.compat.enable_torch_proxy()  # enable the proxy before any `import torch`

    import flashinfer

    # use ops in flashinfer ...
    ```

    Then run:

    ```bash
    PADDLE_COMPATIBLE_API=1 python example.py
    ```

Why this is opt-in

  • We added a simple check for the environment variable `PADDLE_COMPATIBLE_API`. When set, the compatibility hooks and small source adjustments are enabled. This keeps the default behavior unchanged for regular PyTorch or Paddle users.
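
For concreteness, such a gate could look like the following sketch; the helper name matches the `use_paddle_compatible_api` referenced later in this thread, but the exact set of accepted truthy values is our assumption:

```python
import os

def use_paddle_compatible_api() -> bool:
    # Opt-in gate: compatibility hooks activate only when the user explicitly
    # sets PADDLE_COMPATIBLE_API (the accepted values here are an assumption).
    return os.environ.get("PADDLE_COMPATIBLE_API", "").lower() in ("1", "true", "on")
```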

Small changes requested in flashinfer

  • JIT-related logic: some JIT code in flashinfer assumes PyTorch's directory layout. We do not aim to mirror that directory structure 1:1, so we request a small refactor to decouple the logic from the exact torch package file layout (make the paths configurable or resolve modules by import name; see the sketch after this list).
  • setup.py / AOT build: during AOT compilation, setup.py currently does `import torch`. For compatibility builds, the build needs to call `paddle.compat.enable_torch_proxy()` early (before `import torch`), or otherwise provide a small hook so the build imports load paddle instead.
  • To keep these changes backward-compatible, our patch adds an environment-variable-driven path (`PADDLE_COMPATIBLE_API`) inside flashinfer; when it is present, we enable the compatibility adjustments only in that mode.
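
To illustrate the decoupling requested in the first item, the JIT logic could resolve package locations through the import system instead of assuming PyTorch's on-disk layout. This is a hypothetical sketch, not flashinfer's current code; the helper name and derived paths are ours:

```python
import importlib.util
from pathlib import Path

def resolve_package_root(module_name: str = "torch") -> Path:
    # Resolve the installed package directory via the import system, so the
    # same logic works whether `torch` is real PyTorch or a Paddle proxy.
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate package {module_name!r}")
    return Path(spec.origin).parent

# Include paths derive from the resolved root instead of a hard-coded layout.
torch_home = resolve_package_root("torch")
include_dirs = [torch_home / "include"]
```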

Would these minimal, environment-gated changes be acceptable to the flashinfer maintainers?

What we tested

  • We tested the flashinfer.fused_moe.cutlass_fused_moe interface. With the compatibility mode enabled and some additional Python-side compatibility work in progress, we successfully ran fp16 unit tests for that interface.
  • We plan to incrementally expand test coverage and run more of flashinfer's unit tests in CI as part of further collaboration.

Next steps (proposed)

  • If you’re open to collaboration, we can:
    • Open a PR with the smallest possible changes to flashinfer (clearly marked and gated).
    • Add flashinfer tests into PaddlePaddle's CE system (run daily under PADDLE_COMPATIBLE_API=1) and gradually increase coverage.

Thank you for reviewing this PR — we welcome your feedback on the minimal integration approach and are ready to iterate on the branch or make any changes you prefer.

Footnotes

[^1]: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/phi/api/include/compat

[^2]: https://github.com/PaddlePaddle/Paddle/blob/b38a9503d4f3f7c84af44a6399bb76ee043e7616/python/paddle/compat.py#L110

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @SigureMo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a strategic compatibility layer to bridge PyTorch-dependent projects with PaddlePaddle. Its primary goal is to facilitate the adoption of libraries like FlashInfer within the PaddlePaddle ecosystem by providing a seamless, opt-in mechanism that intelligently adapts build processes and API calls without disrupting existing workflows. The changes are designed to be non-invasive and environment-gated, ensuring flexibility and stability for both PyTorch and PaddlePaddle users.

Highlights

  • PaddlePaddle Compatibility: Introduces a minimal, opt-in compatibility layer to enable PyTorch ecosystem libraries, such as FlashInfer, to run on PaddlePaddle with minimal code changes.
  • Design Approach: The compatibility is achieved through a C++/CUDA layer adapter for PyTorch's C API surface (ATen/c10/torch), a Python layer to match PyTorch's API shape (allowing import paddle as torch), and an import proxy (paddle.compat.enable_torch_proxy()) that makes import torch load paddle.
  • Opt-in Mechanism: The compatibility features are activated by setting the PADDLE_COMPATIBLE_API environment variable, ensuring that default behavior remains unchanged for regular PyTorch or Paddle users.
  • Build System Adjustments: Modifies the JIT compilation logic and setup.py to conditionally include PaddlePaddle-specific include paths and linker flags, or PyTorch ones, based on the compatibility mode.
  • Initial Testing: Successfully tested the flashinfer.fused_moe.cutlass_fused_moe interface with fp16 unit tests under the new compatibility mode.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a compatibility layer for PaddlePaddle, allowing third-party PyTorch ecosystem libraries to run on Paddle with minimal changes. The changes involve adapting C++/CUDA APIs and reorganizing the Python API to match PyTorch's structure. The integration is opt-in, controlled by the PADDLE_COMPATIBLE_API environment variable. The review focuses on correctness and maintainability, particularly concerning the conditional logic for PaddlePaddle compatibility and the modifications to the build process.

Comment thread: setup.py

Comment on lines 75 to +77:

```python
if enable_aot:
    if use_paddle_compatible_api():
        import paddle
```
Severity: high

Consider adding a check to ensure paddle is importable before calling paddle.compat.enable_torch_proxy(). If paddle is not installed or available in the environment, this could lead to an ImportError and break the build process. A try...except block can be used to handle this scenario gracefully.

Suggested change:

```python
if enable_aot:
    if use_paddle_compatible_api():
        try:
            import paddle

            paddle.compat.enable_torch_proxy()
        except ImportError:
            print("PaddlePaddle is not installed. Skipping paddle.compat.enable_torch_proxy().")
```

Comment thread: flashinfer/jit/cpp_ext.py

Comment on lines +71 to +77:

```python
if use_paddle_compatible_api():
    system_includes.extend(
        [
            "$torch_home/include",
            "$torch_home/include/torch/csrc/api/include",
        ]
    )
```
Severity: medium

The system_includes.extend method is called within the if block, but the system_includes list already contains some default paths. This could lead to duplicate include paths if use_paddle_compatible_api() returns True, potentially causing issues during compilation. Consider adding the default paths within the else block to avoid duplication.

Alternatively, you can initialize system_includes as an empty list and populate it entirely within the if and else blocks to ensure no overlap.

```python
system_includes = [
    sysconfig.get_path("include"),
    "$cuda_home/include",
    jit_env.FLASHINFER_INCLUDE_DIR.resolve(),
    jit_env.FLASHINFER_CSRC_DIR.resolve(),
]
if use_paddle_compatible_api():
    system_includes.extend(
        [
            "$torch_home/include",
            "$torch_home/include/torch/csrc/api/include",
        ]
    )
else:
    system_includes.extend(
        [
            "$torch_home/include",
            "$torch_home/include/paddle/phi/api/include/compat",
            "$torch_home/include/paddle/phi/api/include/compat/torch/csrc/api/include",
        ]
    )
```

Comment thread: flashinfer/jit/cpp_ext.py

Comment on lines 128 to 131:

```python
ldflags = [
    "-shared",
    "-L$torch_home/lib",
    "-L$cuda_home/lib64",
    "-lc10",
    "-lc10_cuda",
    "-ltorch_cpu",
    "-ltorch_cuda",
    "-ltorch",
    "-lcudart",
]
```
Severity: medium

The ldflags list is initialized with "-shared" and "-lcudart" regardless of the use_paddle_compatible_api() condition. This could lead to redundancy or conflicts if the subsequent extend calls also include -shared or -lcudart. Consider initializing ldflags as an empty list and adding these flags conditionally within the if and else blocks to avoid potential issues.

```python
ldflags = []
if use_paddle_compatible_api():
    ldflags.extend(
        [
            "-shared",
            "-L$torch_home/lib",
            "-L$cuda_home/lib64",
            "-lc10",
            "-lc10_cuda",
            "-ltorch_cpu",
            "-ltorch_cuda",
            "-ltorch",
            "-lcudart",
        ]
    )
else:
    ldflags.extend(
        [
            "-shared",
            "-L$torch_home/libs",
            "-L$torch_home/base",
            "-L$cuda_home/lib64",
            "-lpaddle",
            "-lphi",
            "-lphi_core",
            "-lphi_gpu",
            "-lcommon",
            "-lcudart",
        ]
    )
```

Comment thread: flashinfer/utils.py (outdated)

Comment on lines 476 to 485:

```diff
 return flashinfer.jit.gen_jit_spec(
     "logging",
     [
-        jit_env.FLASHINFER_CSRC_DIR / "logging.cc",
+        flashinfer.jit.env.FLASHINFER_CSRC_DIR / "logging.cc",
     ],
     extra_include_paths=[
-        jit_env.SPDLOG_INCLUDE_DIR,
-        jit_env.FLASHINFER_INCLUDE_DIR,
+        flashinfer.jit.env.SPDLOG_INCLUDE_DIR,
+        flashinfer.jit.env.FLASHINFER_INCLUDE_DIR,
     ],
 ).build_and_load()
```
Severity: medium

Consider using `pathlib.Path.joinpath` instead of the `/` operator for constructing paths; it is more explicit and readable. For example, `flashinfer.jit.env.FLASHINFER_CSRC_DIR.joinpath("logging.cc")`.

Suggested change:

```python
return flashinfer.jit.gen_jit_spec(
    "logging",
    [
        flashinfer.jit.env.FLASHINFER_CSRC_DIR.joinpath("logging.cc"),
    ],
    extra_include_paths=[
        flashinfer.jit.env.SPDLOG_INCLUDE_DIR,
        flashinfer.jit.env.FLASHINFER_INCLUDE_DIR,
    ],
).build_and_load()
```


yzh119 commented Sep 5, 2025

Hi @SigureMo, we plan to go with tvm-ffi to replace the current PyTorch bindings (WIP in #1641); it should satisfy what Paddle needs. Can you double-check?

Cc @tqchen


SigureMo commented Sep 7, 2025

Hi @yzh119, thanks for the work here! I only learned about the recent TVM FFI efforts in the last couple of days. TVM FFI is indeed an excellent FFI solution for ML systems; thanks to you and @tqchen for driving this.

From what I see, TVM FFI can nicely decouple flashinfer from PyTorch by providing a framework-agnostic binding layer, which aligns well with the goals of our compatibility approach. This should remove a lot of pain points in our custom C++ operator ecosystem; at minimum, we would no longer need to worry about C++ ABI / operator-registration compatibility. I took a quick look at the implementation, and it seems we would likely only need a small adaptation for CUDA stream handling in TVM FFI (see: https://github.com/apache/tvm/blob/a819115375568e52f9d2d7376cdbb0a23346c3cb/ffi/python/tvm_ffi/cython/function.pxi#L110-L124). So I'm looking forward to your refactor.

Separately, TVM FFI as a more general, framework-agnostic custom-op solution opens up additional possibilities and could offer more options for our ecosystem compatibility strategy. Do you have any plans to promote or adopt TVM FFI in projects beyond flashinfer? If so, that could help more custom-op projects decouple from the PyTorch ecosystem and move toward a framework-agnostic custom-op ecosystem. @tqchen

@SigureMo SigureMo marked this pull request as draft September 7, 2025 20:48

tqchen commented Sep 7, 2025

Thanks @SigureMo! Yes, we do plan to bring TVM FFI up as an independent project that benefits everyone. We are still at the bring-up stage, so we haven't communicated this broadly yet, but the goal is to make it a general project that can be used across all deep learning frameworks, compilers, and libraries.

@SigureMo SigureMo closed this Sep 17, 2025

yzh119 commented Sep 27, 2025

#1641 is merged, @SigureMo would you mind checking whether it's helpful for paddle compatibility?

@SigureMo

> #1641 is merged, @SigureMo would you mind checking whether it's helpful for paddle compatibility?

@yzh119 Thanks for the work on #1641! I can confirm the C++ layer no longer depends on PyTorch after that change, which removes the adapter maintenance we were carrying on our side—really appreciate it.

I did notice the Python JIT workflow still references torch headers and some torch-specific compile flags.

```python
system_includes = [
    sysconfig.get_path("include"),
    "$torch_home/include",
    "$torch_home/include/torch/csrc/api/include",
    "$cuda_home/include",
    "$cuda_home/include/cccl",
    tvm_ffi.libinfo.find_include_path(),
    tvm_ffi.libinfo.find_dlpack_include_path(),
    jit_env.FLASHINFER_INCLUDE_DIR.resolve(),
    jit_env.FLASHINFER_CSRC_DIR.resolve(),
]
system_includes += [p.resolve() for p in jit_env.CUTLASS_INCLUDE_DIRS]
system_includes.append(jit_env.SPDLOG_INCLUDE_DIR.resolve())
common_cflags = [
    "-DTORCH_EXTENSION_NAME=$name",
    "-DTORCH_API_INCLUDE_EXTENSION_H",
]
if not sysconfig.get_config_var("Py_GIL_DISABLED"):
    common_cflags.append("-DPy_LIMITED_API=0x03090000")
common_cflags += torch_get_pybind11_abi_build_flags()
common_cflags += _get_glibcxx_abi_build_flags()
```

Do you plan to remove those as well?

On the E2E validation: we already landed the Paddle prerequisites (PaddlePaddle/Paddle#75193 and PaddlePaddle/Paddle#75205), so I’m optimistic flashinfer will run on Paddle as smoothly as it does on PyTorch. I’ll run the verification soon—likely right after the holiday.


yzh119 commented Sep 27, 2025

> Do you plan to remove those as well?

Yes, most of them are no longer required; updated in #1795.

yzh119 added a commit that referenced this pull request Sep 29, 2025

## 📌 Description

The codegen logic for PyTorch and TVM should unify after #1641, and this PR cleans up the related codegen functions in tvm_bindings.

Other changes:
1. update tvm-ffi to 0.1.0b11 to incorporate apache/tvm-ffi#67 and apache/tvm-ffi#68
2. rename source files: `_ops.cu` and `_pybind.cu` are renamed to `_binding.cu`
3. remove torch-related header includes and library linking in ninja files (#1642 (comment))
4. remove the use of `use_torch_stream` in unittests; it is no longer required after apache/tvm-ffi#68

## 🔍 Related Issues

#1641 


## Reviewer Notes

cc @MasterJH5574, please let us know what changes we need to make to help you bump to the latest version of flashinfer in MLC.
